The Multistage Forecast Model

This is a tutorial for the Multistage Forecast model. Multistage Forecast is a fast solution designed for more granular time series (for example, minute-level), where a long history is needed to train a good model.

For example, suppose we want to train a model on 2 years of 5-minute frequency data. That’s 210,240 observations. If we directly fit a model to large input data, training time and resource demand can be high (15+ minutes on i9 CPU). If we use a shorter period to train the model, the model will not be able to capture long term effects such as holidays, monthly/quarterly seasonalities, year-end drops, etc. There is a trade-off between speed and accuracy.
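The observation count above follows from simple arithmetic. A quick check, assuming 365-day years and no missing timestamps:

```python
# Number of observations in 2 years of 5-minute data,
# assuming 365-day years and no gaps in the series.
minutes_per_day = 24 * 60
obs_per_day = minutes_per_day // 5   # five-minute intervals per day
n_obs = 2 * 365 * obs_per_day
print(obs_per_day, n_obs)
```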

A related situation arises when, because of a data retention policy, data in the original frequency is available only for a short history while aggregated data covers a longer one. Could we use both datasets to make the prediction more accurate?

Multistage Forecast is designed to close this gap. It’s easy to observe the following facts:

  • Trend can be learned with data at a weekly/daily granularity.

  • Yearly seasonality, weekly seasonality and holiday effects can be learned with daily data.

  • Daily seasonality and autoregression effects can be learned with most recent data if the forecast horizon is small (which is usually the case in minute-level data).

This suggests a natural idea: not every component of the forecast model needs to be learned from minute-level data. Training each component on the coarsest granularity that can still capture it saves substantial time while keeping the desired accuracy.

Here we introduce the Multistage Forecast algorithm, which is built upon the idea above:

  • Multistage Forecast trains multiple models to fit a time series.

  • Each stage of the model trains on the residuals of the previous stages, takes an appropriate length of data, does an optional aggregation, and learns the appropriate components for the granularity.

  • The final predictions will be the sum of the predictions from all stages of models.
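The three points above can be sketched as a toy two-stage model in plain Python. This is not Greykite's implementation: the functions `fit_two_stage` and `predict_two_stage` are hypothetical, stage 1 learns only one level per day of week from daily means, and stage 2 learns only one offset per hour of day from the residuals of the most recent days.

```python
from statistics import mean

HOURS_PER_DAY = 24

def fit_two_stage(y, recent_days=14):
    """Fit a toy two-stage model to an hourly series `y` (a list of numbers).

    Assumes the series starts at hour 0 of day 0, covers whole days,
    and spans at least one full week.
    """
    n_days = len(y) // HOURS_PER_DAY

    # Stage 1: aggregate to daily means, then learn one level per day of week.
    daily = [mean(y[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY]) for d in range(n_days)]
    dow_level = [mean(daily[d] for d in range(n_days) if d % 7 == k) for k in range(7)]

    # Stage 2: take stage-1 residuals on the most recent days only,
    # then learn one offset per hour of day.
    start = (n_days - recent_days) * HOURS_PER_DAY
    resid = [y[t] - dow_level[(t // HOURS_PER_DAY) % 7] for t in range(start, len(y))]
    hour_offset = [mean(resid[h::HOURS_PER_DAY]) for h in range(HOURS_PER_DAY)]
    return dow_level, hour_offset

def predict_two_stage(model, t):
    """The final prediction at hourly index `t` is the sum of both stages."""
    dow_level, hour_offset = model
    return dow_level[(t // HOURS_PER_DAY) % 7] + hour_offset[t % HOURS_PER_DAY]

# Synthetic series: a weekly level plus a daily shape, 8 weeks of hourly data.
y = [100 + 10 * ((t // HOURS_PER_DAY) % 7) + (t % HOURS_PER_DAY) for t in range(8 * 7 * 24)]
model = fit_two_stage(y)
next_hour = predict_two_stage(model, len(y))  # first hour after the data ends
print(next_hour)
```

Stage 1 sees only 56 daily values and stage 2 sees only 14 days of hourly residuals, yet together they reproduce the synthetic series. This is the source of the speed-up: no single stage has to process the full-resolution history.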

In practice, we’ve found Multistage Forecast to reduce training time by up to 10X while maintaining accuracy, compared to a Silverkite model trained on the full dataset.

A diagram of the Multistage Forecast model flow is shown below.

Multistage Forecast training flow

Next, we will see examples of how to configure Multistage Forecast models.

# import libraries
import plotly
from greykite.framework.templates.forecaster import Forecaster
from greykite.framework.templates.autogen.forecast_config import (
    ForecastConfig, MetadataParam, ModelComponentsParam, EvaluationPeriodParam)
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.benchmark.data_loader_ts import DataLoaderTS
from greykite.algo.forecast.silverkite.forecast_simple_silverkite_helper import cols_interact
from greykite.framework.templates.multistage_forecast_template_config import MultistageForecastTemplateConfig

Configuring the Multistage Forecast model

We take an hourly dataset as an example. We will use the hourly Washington D.C. bikesharing dataset (source).

# loads the dataset
ts = DataLoaderTS().load_bikesharing_ts()
print(ts.df.head())

# plot the data
plotly.io.show(ts.plot())

Out:

                                     ts        date   y  tmin  tmax   pn
2010-09-20 12:00:00 2010-09-20 12:00:00  2010-09-20  11  12.8  25.6  0.0
2010-09-20 13:00:00 2010-09-20 13:00:00  2010-09-20  13  12.8  25.6  0.0
2010-09-20 14:00:00 2010-09-20 14:00:00  2010-09-20   8  12.8  25.6  0.0
2010-09-20 15:00:00 2010-09-20 15:00:00  2010-09-20   8  12.8  25.6  0.0
2010-09-20 16:00:00 2010-09-20 16:00:00  2010-09-20  12  12.8  25.6  0.0

The data contains a few years of hourly data. Directly training on the entire dataset may take a couple of minutes. Now let’s consider a two-stage model with the following configuration:

  • Daily model: a model trained on 2 years of data with daily aggregation. The model will learn the trend, yearly seasonality, weekly seasonality and holidays. For an explanation of the configuration below, see the paper.

  • Hourly model: a model trained on the residuals to learn short term patterns. The model will learn daily seasonality, its interaction with the is_weekend indicator, and some autoregression effects.

From Tune your first forecast model we know how to specify each single model above. The core configuration is specified via ModelComponentsParam. We can specify the two models as follows.

# the daily model
daily_model_components = ModelComponentsParam(
    growth=dict(
        growth_term="linear"
    ),
    seasonality=dict(
        yearly_seasonality=12,
        quarterly_seasonality=0,
        monthly_seasonality=0,
        weekly_seasonality=5,
        daily_seasonality=0  # daily model does not have daily seasonality
    ),
    changepoints=dict(
        changepoints_dict=dict(
            method="auto",
            regularization_strength=0.5,
            yearly_seasonality_order=12,
            resample_freq="3D",
            potential_changepoint_distance="30D",
            no_changepoint_distance_from_end="30D"
        ),
        seasonality_changepoints_dict=None
    ),
    autoregression=dict(
        autoreg_dict="auto"
    ),
    events=dict(
        holidays_to_model_separately=["Christmas Day", "New Year's Day", "Independence Day", "Thanksgiving"],
        holiday_lookup_countries=["UnitedStates"],
        holiday_pre_num_days=1,
        holiday_post_num_days=1
    ),
    custom=dict(
        fit_algorithm_dict=dict(
            fit_algorithm="ridge"
        ),
        feature_sets_enabled="auto",
        min_admissible_value=0
    )
)

# creates daily seasonality interaction with is_weekend
daily_interaction = cols_interact(
    static_col="is_weekend",
    fs_name="tod_daily",
    fs_order=5
)

# the hourly model
hourly_model_components = ModelComponentsParam(
    growth=dict(
        growth_term=None  # growth is already modeled in daily model
    ),
    seasonality=dict(
        yearly_seasonality=0,
        quarterly_seasonality=0,
        monthly_seasonality=0,
        weekly_seasonality=0,
        daily_seasonality=12  # hourly model has daily seasonality
    ),
    changepoints=dict(
        changepoints_dict=None,
        seasonality_changepoints_dict=None
    ),
    events=dict(
        holidays_to_model_separately=None,
        holiday_lookup_countries=[],
        holiday_pre_num_days=0,
        holiday_post_num_days=0
    ),
    autoregression=dict(
        autoreg_dict="auto"
    ),
    custom=dict(
        fit_algorithm_dict=dict(
            fit_algorithm="ridge"
        ),
        feature_sets_enabled="auto",
        extra_pred_cols=daily_interaction
    )
)

Now to use Multistage Forecast, just like specifying the model components of the Simple Silverkite model, we need to specify the model components for Multistage Forecast. The Multistage Forecast configuration is specified via ModelComponentsParam.custom["multistage_forecast_configs"], which takes a list of MultistageForecastTemplateConfig objects, each of which represents a stage of the model.

The MultistageForecastTemplateConfig object for a single stage takes the following parameters:

  • train_length: the length of training data, for example "365D". Looks back from the end of the training data and takes observations up to this limit.

  • fit_length: the length of data where fitted values are calculated. Even if the training data is not the entire period, the fitted values can still be calculated on the entire period. The default will be the same as the training length.

  • agg_freq: the aggregation frequency in string representation. For example, “D”, “H”, etc. If not specified, the original frequency will be kept.

  • agg_func: the aggregation function name, default is "nanmean".

  • model_template: the model template name. This together with the model_components below specify the full model, just as when using the Simple Silverkite model.

  • model_components: the model components. This together with the model_template above specify the full model for a stage, just as when using the Simple Silverkite model.

MultistageForecastTemplateConfig represents the flow of each stage of the model: taking the time series / residual, taking the appropriate length of training data, doing an optional aggregation, then training the model with the given parameters. Now let’s define the MultistageForecastTemplateConfig object one by one.
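To make the roles of train_length, agg_freq and agg_func concrete, here is a minimal standard-library sketch of the "take recent data, then aggregate" step. The helpers `take_recent` and `aggregate_daily_nanmean` are hypothetical, and the exact window boundary Greykite uses is an assumption here.

```python
import math
from datetime import datetime, timedelta

def take_recent(timestamps, values, train_length):
    """Keep the observations that fall within `train_length` of the last timestamp.

    The boundary handling (exclusive cutoff) is an assumption of this sketch.
    """
    cutoff = timestamps[-1] - train_length
    return [(ts, v) for ts, v in zip(timestamps, values) if ts > cutoff]

def aggregate_daily_nanmean(rows):
    """Average values per calendar day, skipping NaN.

    Stands in for agg_freq="D" with agg_func="nanmean". Days with no
    valid value are dropped here rather than returned as NaN.
    """
    buckets = {}
    for ts, v in rows:
        if not math.isnan(v):
            buckets.setdefault(ts.date(), []).append(v)
    return {day: sum(vs) / len(vs) for day, vs in sorted(buckets.items())}

# Three days of hourly data with one missing value.
start = datetime(2020, 1, 1)
timestamps = [start + timedelta(hours=i) for i in range(72)]
values = [float(i) for i in range(72)]
values[30] = float("nan")

recent = take_recent(timestamps, values, train_length=timedelta(days=2))
daily = aggregate_daily_nanmean(recent)
for day, value in daily.items():
    print(day, round(value, 2))
```

With a two-day train_length, only the last 48 hourly rows are kept, and the daily aggregation reduces them to two training points for that stage.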

# the daily model
daily_config = MultistageForecastTemplateConfig(
    train_length="730D",                               # use 2 years of data to train
    fit_length=None,                                   # fit on the same period as training
    agg_func="nanmean",                                # aggregation function is nanmean
    agg_freq="D",                                      # aggregation frequency is daily
    model_template=ModelTemplateEnum.SILVERKITE.name,  # the model template
    model_components=daily_model_components            # the daily model components specified above
)

# the hourly model
hourly_config = MultistageForecastTemplateConfig(
    train_length="30D",                                # use 30 days of data to train
    fit_length=None,                                   # fit on the same period as training
    agg_func="nanmean",                                # aggregation function is nanmean
    agg_freq=None,                                     # None means no aggregation
    model_template=ModelTemplateEnum.SILVERKITE.name,  # the model template
    model_components=hourly_model_components           # the hourly model components specified above
)

The configurations simply go to ModelComponentsParam.custom["multistage_forecast_configs"] as a list. We can specify the model components for Multistage Forecast as below. Note that all keys other than "custom" and "uncertainty" will be ignored.

model_components = ModelComponentsParam(
    custom=dict(
        multistage_forecast_configs=[daily_config, hourly_config]
    ),
    uncertainty=dict()
)

Now we can fill in other parameters needed by ForecastConfig.

# metadata
metadata = MetadataParam(
    time_col="ts",
    value_col="y",
    freq="H"  # the frequency should match the original data frequency
)

# evaluation period
evaluation_period = EvaluationPeriodParam(
    cv_max_splits=0,  # turn off cv for speeding up
    test_horizon=0,  # turn off test for speeding up
)

# forecast config
config = ForecastConfig(
    model_template=ModelTemplateEnum.MULTISTAGE_EMPTY.name,
    forecast_horizon=24,  # forecast 1 day ahead
    coverage=0.95,  # prediction interval is supported
    metadata_param=metadata,
    model_components_param=model_components,
    evaluation_period_param=evaluation_period
)
forecaster = Forecaster()
forecast_result = forecaster.run_forecast_config(
    df=ts.df,
    config=config
)

print(forecast_result.forecast.df_test.head())

# plot the predictions
fig = forecast_result.forecast.plot()
# interactive plot, click to zoom in
plotly.io.show(fig)

Out:

                       ts  actual    forecast  forecast_lower  forecast_upper
78421 2019-09-01 01:00:00     NaN  214.459870       63.241347      365.678394
78422 2019-09-01 02:00:00     NaN  131.300281      -19.918242      282.518804
78423 2019-09-01 03:00:00     NaN   88.781467      -62.437057      239.999990
78424 2019-09-01 04:00:00     NaN   41.619714     -109.598809      192.838238
78425 2019-09-01 05:00:00     NaN   23.770384     -127.448139      174.988907

This model trains about 3X faster than Silverkite on the entire hourly data (23.5 seconds vs 79.4 seconds). If speed is a concern because of high-frequency data with a long history, Multistage Forecast is worth trying.

Note

The order of specifying the MultistageForecastTemplateConfig objects does not matter. The models will be automatically sorted with respect to train_length from long to short. This is to ensure that we have enough residuals from the previous model when we fit the next model.
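The ordering rule in this note can be illustrated with a small sketch. `StageConfig` is a hypothetical stand-in for MultistageForecastTemplateConfig, and `train_days` handles only day-based strings such as "730D"; the library's own parsing and sorting may differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StageConfig:
    """Hypothetical stand-in for MultistageForecastTemplateConfig."""
    train_length: str                # e.g. "730D"
    agg_freq: Optional[str] = None   # None means no aggregation

def train_days(config):
    """Parse a day-based length such as "730D" into a number of days."""
    if not config.train_length.endswith("D"):
        raise ValueError(f"this sketch only supports day-based lengths: {config.train_length}")
    return int(config.train_length[:-1])

# Stages given in "wrong" order: the short hourly stage first.
stages = [StageConfig("30D", None), StageConfig("730D", "D")]

# Sort from the longest training window to the shortest, so each stage
# has residuals from the stages before it.
ordered = sorted(stages, key=train_days, reverse=True)
print([s.train_length for s in ordered])
```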

Note

The estimator expects different stage models to have different aggregation frequencies. If two stages have the same aggregation frequency, an error will be raised.

Note

Since the models in each stage may not fit on the entire training data, there could be periods at the beginning of the training period where fitted values are not calculated. These NA fitted values are ignored when computing evaluation metrics on the training set.
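As an illustration of what "ignored" means here, the sketch below computes a mean absolute error that skips positions where the fitted value is NaN. The function `mae_ignoring_na` is hypothetical and is not the metric code Greykite uses.

```python
import math

def mae_ignoring_na(actual, fitted):
    """Mean absolute error over positions where both values are present."""
    pairs = [(a, f) for a, f in zip(actual, fitted)
             if not (math.isnan(a) or math.isnan(f))]
    if not pairs:
        return float("nan")  # nothing to evaluate
    return sum(abs(a - f) for a, f in pairs) / len(pairs)

actual = [10.0, 12.0, 11.0, 13.0]
# The first two fitted values are missing because no stage covered that period.
fitted = [float("nan"), float("nan"), 10.0, 15.0]
error = mae_ignoring_na(actual, fitted)
print(error)
```

Only the last two points contribute, so the training metric reflects the period where fitted values exist rather than being dragged to NaN by the uncovered prefix.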

The uncertainty configuration

If you would like to include the uncertainty intervals, you can specify the "uncertainty" parameter in model components.

The "uncertainty" key in ModelComponentsParam takes one key: "uncertainty_dict", which is a dictionary taking the following keys:

  • "uncertainty_method": a string representing the uncertainty method, for example, "simple_conditional_residuals".

  • "params": a dictionary of additional parameters needed by the uncertainty method.

Now let’s configure the "simple_conditional_residuals" uncertainty method via the uncertainty_dict parameter.

# specifies the ``uncertainty`` parameter
uncertainty = dict(
    uncertainty_dict=dict(
        uncertainty_method="simple_conditional_residuals",
        params=dict(
            conditional_cols=["dow"]  # conditioning on day of week
        )
    )
)

# adds to the ``ModelComponentsParam``
model_components = ModelComponentsParam(
    custom=dict(
        multistage_forecast_configs=[daily_config, hourly_config]
    ),
    uncertainty=uncertainty
)
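The idea behind conditioning on a column such as "dow" can be sketched as follows: residuals are grouped by the conditioning value, and each group gets its own interval width. This is a rough illustration under a normality assumption, with hypothetical helpers `half_widths_by_group` and `interval`; Greykite's actual method may compute the spread differently.

```python
from statistics import NormalDist, stdev

def half_widths_by_group(residuals, groups, coverage=0.95):
    """Interval half-width per group: normal quantile times the residual std."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    by_group = {}
    for r, g in zip(residuals, groups):
        by_group.setdefault(g, []).append(r)
    # Each group needs at least two residuals for a sample standard deviation.
    return {g: z * stdev(rs) for g, rs in by_group.items()}

def interval(forecast, group, half_widths):
    """Symmetric prediction interval around a point forecast."""
    hw = half_widths[group]
    return forecast - hw, forecast + hw

# Toy residuals: Saturdays are noisier than Mondays.
residuals = [1.0, -1.0, 1.0, -1.0, 5.0, -5.0, 5.0, -5.0]
dow = ["Mon"] * 4 + ["Sat"] * 4
half_widths = half_widths_by_group(residuals, dow)

print(interval(100.0, "Mon", half_widths))
print(interval(100.0, "Sat", half_widths))
```

Conditioning lets the interval widen on days whose residuals are more spread out, instead of applying one width to every timestamp.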

The Multistage Forecast Model templates

In the example above we have seen a model template named MULTISTAGE_EMPTY. It is an empty template that must be used with specified model components. Any model components (multistage_forecast_configs) specified will be exactly the model parameters to be used. Template Overview explains how model templates work and how they are overridden by model components.

The Multistage Forecast model also comes with the following model template:

  • SILVERKITE_TWO_STAGE: a two-stage model similar to the model we present above. The first stage is a daily model trained on 56 * 7 days of data learning the long term effects including yearly/quarterly/monthly/weekly seasonality, holidays, etc. The second stage is a short term model in the original data frequency learning the daily seasonality and autoregression effects. Both stages’ model_templates are SILVERKITE. Note that this template assumes the data to be sub-daily.

When you choose to use the Multistage Forecast model templates, you can override default values by specifying the model components. The overriding in Multistage Forecast works as follows:

  • For each MultistageForecastTemplateConfig that is overridden, there are two situations.

    If the customized model_template is the same as the model_template in the default model, for example, both are SILVERKITE, then the customized model_components in the MultistageForecastTemplateConfig will be used to override the model_components in the default MultistageForecastTemplateConfig, as overriding is done in the Silverkite template.

    If the model templates are different, say SILVERKITE in the default and SILVERKITE_EMPTY in the customized, then both the new model_template and the new entire model_components will be used to replace the original model_template and model_components in the default model.

    In both cases, the train_length, fit_length, agg_func and agg_freq will be overridden.

For example, in SILVERKITE_TWO_STAGE, both stages of default templates are SILVERKITE. Consider the following example.

model_template = "SILVERKITE_TWO_STAGE"
model_components_override = ModelComponentsParam(
    custom=dict(
        multistage_forecast_configs=[
            MultistageForecastTemplateConfig(
                train_length="730D",
                fit_length=None,
                agg_func="nanmean",
                agg_freq="D",
                model_template=ModelTemplateEnum.SILVERKITE.name,
                model_components=ModelComponentsParam(
                    seasonality=dict(
                        weekly_seasonality=7
                    )
                )
            ),
            MultistageForecastTemplateConfig(
                train_length="30D",
                fit_length=None,
                agg_func="nanmean",
                agg_freq=None,
                model_template=ModelTemplateEnum.SILVERKITE_EMPTY.name,
                model_components=ModelComponentsParam(
                    seasonality=dict(
                        daily_seasonality=10
                    )
                )
            )
        ]
    )
)

The first model has the same model template, SILVERKITE, as the default, so within model_components only the weekly seasonality parameter is used to override the default weekly seasonality of the SILVERKITE model template. The second model has a different model template, SILVERKITE_EMPTY, so it uses exactly the model template and model components specified in the customized parameters.

This design is to maximize the flexibility to override an existing Multistage Forecast model template. However, if you fully know what your configuration will be for each stage of the model, the suggestion is to use MULTISTAGE_EMPTY and specify your own configurations.

Note

If the customized model components contain fewer models than the model template provides (for example, only one stage is customized when using SILVERKITE_TWO_STAGE), the customized MultistageForecastTemplateConfig overrides the first model in SILVERKITE_TWO_STAGE, and the second model in SILVERKITE_TWO_STAGE is kept after the overridden first model. Conversely, if three models are customized, the extra customized model is appended after the two models in SILVERKITE_TWO_STAGE.
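The overriding rules described in this section can be summarized in a short sketch that uses plain dictionaries in place of MultistageForecastTemplateConfig objects. The functions `stage`, `merge_components` and `override_stages` are hypothetical, and the default values shown are placeholders rather than the real defaults of SILVERKITE_TWO_STAGE (only the 56 * 7 day training length comes from the text above).

```python
def stage(train_length, agg_freq, model_template, model_components):
    """Build a plain-dict stand-in for one stage configuration."""
    return dict(train_length=train_length, fit_length=None, agg_func="nanmean",
                agg_freq=agg_freq, model_template=model_template,
                model_components=model_components)

def merge_components(default, custom):
    """Override default component parameters key by key."""
    merged = {name: dict(params) for name, params in default.items()}
    for name, params in custom.items():
        merged.setdefault(name, {}).update(params)
    return merged

def override_stages(default_stages, custom_stages):
    merged = []
    for i, custom in enumerate(custom_stages):
        # train_length, fit_length, agg_func and agg_freq always come from the customization.
        new_stage = dict(custom)
        if i < len(default_stages):
            default = default_stages[i]
            if custom["model_template"] == default["model_template"]:
                # Same template: customized components override the defaults.
                new_stage["model_components"] = merge_components(
                    default["model_components"], custom["model_components"])
            # Different template: the customized template and components replace the defaults.
        merged.append(new_stage)  # extra customized stages are simply appended
    # Default stages without a customized counterpart are kept as they are.
    merged.extend(dict(s) for s in default_stages[len(custom_stages):])
    return merged

# Placeholder defaults for a two-stage template.
defaults = [
    stage("392D", "D", "SILVERKITE",
          {"seasonality": {"yearly_seasonality": 12, "weekly_seasonality": 5}}),
    stage("28D", None, "SILVERKITE",
          {"seasonality": {"daily_seasonality": 12}}),
]
custom = [
    stage("730D", "D", "SILVERKITE", {"seasonality": {"weekly_seasonality": 7}}),
    stage("30D", None, "SILVERKITE_EMPTY", {"seasonality": {"daily_seasonality": 10}}),
]

result = override_stages(defaults, custom)
print(result[0]["model_components"])
print(result[1]["model_template"], result[1]["model_components"])
```

Passing only the first customized stage leaves the second default stage in place, and passing a third customized stage appends it after the two template stages, which matches the behavior described in the note.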